Public Internet Correlation between Population Density or Schools
DSAN 6750 / PPOL 6805: GIS for Spatial Data Science
Author
Affiliation
Gabriel Soto
Georgetown University
Introduction
Do public schools get better access to public internet? This research analyses whether public schools in Panama, get access to public internet via access points - Google Satellite - Migration Panama
meter data temporal de uso de datos por access points
agregar datos de cantidad de estudiantes por distrito https://www.inec.gob.pa/archivos/P030194820231213142523Cuadro%2021.pdf
agregar un indice normalizado (ya que el numero es muy chiquito) de la cantidad de casas sin internet https://www.inec.gob.pa/archivos/P0705547520240202111515Cuadro%201.pdf
Future: - indicadores socioeconomicos a nivel de distrito https://www.inec.gob.pa/archivos/P0579518620240202083001Cuadro%204.pdf
dijo:
It’s the remoteness of Oyala that makes it so appealing to President Obiang. In a rare interview he described how rebels had recently plotted a seaborne assault on his palace in the current capital, Malabo. ‘We need a secure place for my government and for future governments. That’s why we have created Oyala, to guarantee the government of Equatorial Guinea.’ (Sackur 2012)
This case is far from exceptional, as an even more recent Washington Post article points out with respect to Myanmar’s decision to move its capital from Yangon to Naypyidaw:
Analysts have described the decision as motivated by a desire to secure the military’s seat of power from any threat of protests or invasions. (Berger 2021)
Most of these studies, however, are based on observations of conflict events. In this study, we study the more fundamental variable of a capital’s distance from the population centroid of the country.
Literature Review
Campante, Do, and Guimaraes (2019) analyzes the relationship between the location of a capital city and the degree of conflict and misgovernance in a given country. Their two key findings are that:
Conflict is more likely to emerge (and dislodge incumbents) closer to the capital
and
Isolated capitals are associated with misgovernance.
This first finding is illustrated in ?@fig-conflict-dist
Code
#librarylibrary(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
Code
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Code
library(stringr)library(ggplot2)library(tmap)
Warning: package 'tmap' was built under R version 4.4.2
Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
remotes::install_github('r-tmap/tmap')
Code
library(readr)library(DT)
Warning: package 'DT' was built under R version 4.4.2
Code
library(skimr)
Warning: package 'skimr' was built under R version 4.4.2
Code
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Code
library(leaflet)
Code
#functionsconvert_dms_to_decimal <-function(dms_str) {# Extract components parts <-str_match(dms_str, "(\\d+)° (\\d+)' (\\d+\\.?\\d*)\"\" ([N|S|E|W])")# Convert components to numeric degrees <-as.numeric(parts[, 2]) minutes <-as.numeric(parts[, 3]) seconds <-as.numeric(parts[, 4])# Calculate decimal degrees decimal <- degrees + minutes/60+ seconds/3600# Apply sign based on direction direction <- parts[, 5] decimal <-ifelse(direction %in%c("S", "W"), -decimal, decimal)return(decimal)}
#Cleaning data# Convert your coordinatesschools_csv$decimal_lat <-convert_dms_to_decimal(schools_csv$lat)schools_csv$decimal_long <-convert_dms_to_decimal(schools_csv$long)#removing long, lat columns and creating schools_dfschools_df <- schools_csv %>%select(-lat) %>%select(-long)# Create SF object with converted coordinates from schools_dfschools_data <-st_as_sf(schools_df, coords =c("decimal_long", "decimal_lat"),crs =4326)
datatable(district_summary,colnames =c("District", "Number of Schools", "Number of Access Points", "Population", "AP per School", "AP per 1000 people"),options =list(pageLength =20))
A marker object has been specified, but markers is not in the mode
Adding markers to the mode...
Code
library(plotly)# Create side-by-side plotssubplot(# Left plot: Access Points vs Populationplot_ly(district_summary, x =~population, y =~access_points,type ='scatter',mode ='markers',marker =list(size =10,color ='rgb(49,130,189)',opacity =0.7,line =list(color ='rgb(27,71,105)', width =1) ),text =~district,hovertemplate =paste('<b>District:</b> %{text}<br>','<b>Population:</b> %{x:,}<br>','<b>Access Points:</b> %{y}<br>','<extra></extra>' ) ) %>%add_lines(x =~population,y =~fitted(lm(access_points ~ population, data = district_summary)),line =list(color ='rgb(219,64,82)',width =2,dash ='dash'),showlegend =TRUE,name ='Access Points Trend') %>%layout(title ="Access Points vs Population",xaxis =list(title ="Population",gridcolor ='rgb(255,255,255)',gridwidth =2,showgrid =TRUE,zeroline =FALSE ),yaxis =list(title ="Number of Access Points",gridcolor ='rgb(255,255,255)',gridwidth =2,showgrid =TRUE,zeroline =FALSE ),paper_bgcolor ='rgb(251,251,251)',plot_bgcolor ='rgb(251,251,251)' ),# Right plot: Access Points per Schoolplot_ly(district_summary, x =~population, y =~ap_per_school,type ='scatter',mode ='markers',marker =list(size =10,color ='rgb(50,205,50)',opacity =0.7,line =list(color ='rgb(0,100,0)', width =1) ),text =~district,hovertemplate =paste('<b>District:</b> %{text}<br>','<b>Population:</b> %{x:,}<br>','<b>Access Points per School:</b> %{y:.2f}<br>','<b>Total Schools:</b> ', district_summary$schools, '<br>','<extra></extra>' ) ) %>%add_lines(x =~population,y =~fitted(lm(ap_per_school ~ population, data = district_summary)),line =list(color ='rgb(255,69,0)',width =2,dash ='dash'),showlegend =TRUE,name ='AP per School Trend') %>%layout(title ="Access Points per School vs Population",xaxis =list(title ="Population",gridcolor ='rgb(255,255,255)',gridwidth =2,showgrid =TRUE,zeroline =FALSE ),yaxis =list(title ="Access Points per School",gridcolor ='rgb(255,255,255)',gridwidth =2,showgrid =TRUE,zeroline =FALSE ),paper_bgcolor ='rgb(251,251,251)',plot_bgcolor ='rgb(251,251,251)' ),# Subplot layoutnrows =1,shareX =TRUE,titleX =TRUE)
A marker object has been specified, but markers is not in the mode
Adding markers to the mode...
A marker object has been specified, but markers is not in the mode
Adding markers to the mode...
Code
# Run linear regressionmodel <-lm(access_points ~ population, data = district_summary)# View summary of regression resultssummary(model)
Call:
lm(formula = access_points ~ population, data = district_summary)
Residuals:
Min 1Q Median 3Q Max
-41.413 -4.688 -2.250 3.524 34.859
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.874e+00 1.302e+00 3.743 0.00036 ***
population 2.422e-04 8.992e-06 26.934 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.54 on 73 degrees of freedom
Multiple R-squared: 0.9086, Adjusted R-squared: 0.9073
F-statistic: 725.5 on 1 and 73 DF, p-value: < 2.2e-16
Code
# For a more detailed analysis, we can also include additional variablesmodel_extended <-lm(access_points ~ population + schools, data = district_summary)summary(model_extended)
Call:
lm(formula = access_points ~ population + schools, data = district_summary)
Residuals:
Min 1Q Median 3Q Max
-40.368 -3.912 0.572 4.103 19.345
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.568e+00 1.772e+00 -1.449 0.152
population 2.058e-04 1.021e-05 20.171 < 2e-16 ***
schools 2.162e-01 4.019e-02 5.379 8.89e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.962 on 72 degrees of freedom
Multiple R-squared: 0.9348, Adjusted R-squared: 0.933
F-statistic: 516 on 2 and 72 DF, p-value: < 2.2e-16
Code
# Create the mapap_map <-leaflet(ap_data) %>%# Add base map tilesaddProviderTiles(providers$CartoDB.Positron) %>%# Add access pointsaddCircleMarkers(radius =5,color ="blue",fillColor ="blue",fillOpacity =0.6,stroke =TRUE,weight =1,popup =~paste("<strong>Location:</strong>", name, "<br>","<strong>Province:</strong>", province, "<br>","<strong>District:</strong>", district, "<br>","<strong>County:</strong>", county, "<br>","<strong>Type:</strong>", type, "<br>","<strong>Date:</strong>", date ),# Enable clustering to handle many pointsclusterOptions =markerClusterOptions() ) %>%# Set the initial view to PanamasetView(lng =-80.782127, lat =8.537981, zoom =7)ap_map
Warning in pal(access_points): Some values were outside the color scale and
will be treated as NA
Code
library(osmdata)
Warning: package 'osmdata' was built under R version 4.4.2
Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
Code
library(sf)panama_bb <-getbb("Panama")highway <-opq(panama_bb) %>%add_osm_feature(key ="name", value ="Carretera Panamericana") %>%osmdata_sf()leaflet() %>%addProviderTiles(providers$CartoDB.DarkMatter) %>%addPolylines(data = highway$osm_lines, # Using the lines from your downloaded datacolor ="red",weight =2,opacity =0.8,popup =~name # This will show the road name when clicked ) %>%setView(lng =-80.782127, lat =8.537981, zoom =7)
Code
pal <-colorFactor(palette ="Set3",domain =unique(schools_data$type))leaflet() %>%# Add a base map (you can change the provider as we discussed)addProviderTiles(providers$CartoDB.DarkMatter) %>%# Add school points with clusteringaddCircleMarkers(data = schools_data,radius =6,color =~pal(type),fillOpacity =1,stroke =TRUE,weight =1,popup =~paste("<strong>School:</strong>", name, "<br>","<strong>Type:</strong>", type, "<br>","<strong>Province:</strong>", province, "<br>","<strong>District:</strong>", district, "<br>","<strong>County:</strong>", county ),clusterOptions =markerClusterOptions() ) %>%addLegend(position ="bottomright",pal = pal,values = schools_data$type,title ="School Type" ) %>%# Set the initial view to PanamasetView(lng =-80.782127, lat =8.537981, zoom =7)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Code
highway_districts <-unique(districts_with_highway$NAME_2)leaflet() %>%addProviderTiles(providers$CartoDB.DarkMatter) %>%# Add all districts in light grayaddPolygons(data = district_data,fillColor ="lightgray",fillOpacity =0.3,weight =1,color ="gray" ) %>%# Highlight districts with highway in blueaddPolygons(data = district_data[district_data$NAME_2 %in% highway_districts,],fillColor ="yellow",fillOpacity =0.3,weight =1,color ="blue",popup =~NAME_2 ) %>%# Add the highway in redaddPolylines(data = highway$osm_lines,color ="red",weight =2 ) %>%setView(lng =-80.782127, lat =8.537981, zoom =7)
Code
library(spdep)
Warning: package 'spdep' was built under R version 4.4.2
Loading required package: spData
To access larger datasets in this package, install the spDataLarge
package with: `install.packages('spDataLarge',
repos='https://nowosad.github.io/drat/', type='source')`
Code
district_data_complete <-merge(district_data, district_summary, by.x ="NAME_2", by.y ="district")# 1. Create a neighbors list based on district boundaries# Queens case (districts sharing any boundary point)nb <-poly2nb(district_data_complete, queen =TRUE)
Warning in poly2nb(district_data_complete, queen = TRUE): some observations have no neighbours;
if this seems unexpected, try increasing the snap argument.
Warning in poly2nb(district_data_complete, queen = TRUE): neighbour object has 3 sub-graphs;
if this sub-graph count seems unexpected, try increasing the snap argument.
Code
# 2. Create spatial weightsw <-nb2listw(nb, style ="W", zero.policy =TRUE)# Now calculate Moran's I with zero.policy=TRUEmoran_test <-moran.test(district_data_complete$access_points, w, zero.policy =TRUE)print(moran_test)
Moran I test under randomisation
data: district_data_complete$access_points
weights: w
n reduced by no-neighbour observations
Moran I statistic standard deviate = 4.7929, p-value = 8.218e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.229408902 -0.013888889 0.002576752
library(spdep)# 1. Access Points and Schools relationship# Standardize the variablesz_ap <-scale(district_data_complete$access_points)z_schools <-scale(district_data_complete$schools)# Calculate Moran's I for Access Points vs Schoolsmoran_ap_schools <-moran.test(district_data_complete$access_points, nb2listw(nb, style="W", zero.policy=TRUE), zero.policy=TRUE)print("Moran's I for Access Points vs Schools:")
[1] "Moran's I for Access Points vs Schools:"
Code
print(moran_ap_schools)
Moran I test under randomisation
data: district_data_complete$access_points
weights: nb2listw(nb, style = "W", zero.policy = TRUE)
n reduced by no-neighbour observations
Moran I statistic standard deviate = 4.7929, p-value = 8.218e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.229408902 -0.013888889 0.002576752
Code
# 2. Access Points and Density relationship# Standardize densityz_density <-scale(district_data_complete$density)# Calculate Moran's I for Access Points vs Densitymoran_ap_density <-moran.test(district_data_complete$density, nb2listw(nb, style="W", zero.policy=TRUE), zero.policy=TRUE)print("Moran's I for Access Points vs Density:")
[1] "Moran's I for Access Points vs Density:"
Code
print(moran_ap_density)
Moran I test under randomisation
data: district_data_complete$density
weights: nb2listw(nb, style = "W", zero.policy = TRUE)
n reduced by no-neighbour observations
Moran I statistic standard deviate = 5.813, p-value = 3.068e-09
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.1008972152 -0.0138888889 0.0003899232
Code
# Create visualization plotspar(mfrow=c(1,2)) # Set up a 1x2 plotting area# Plot for Access Points vs Schoolsmoran.plot(district_data_complete$access_points, nb2listw(nb, style="W", zero.policy=TRUE),labels=district_data_complete$NAME_2,xlab="Access Points",ylab="Spatially Lagged Access Points",main="Access Points vs Schools",zero.policy=TRUE)# Plot for Access Points vs Densitymoran.plot(district_data_complete$density, nb2listw(nb, style="W", zero.policy=TRUE),labels=district_data_complete$NAME_2,xlab="Population Density",ylab="Spatially Lagged Density",main="Access Points vs Density",zero.policy=TRUE)
Code
# Calculate ratio of access points to densitydistrict_data_complete$ap_per_density <- district_data_complete$access_points / district_data_complete$density# Calculate Moran's Imoran_density <-moran.test(district_data_complete$ap_per_density, w, zero.policy =TRUE)print("Moran's I for Access Points relative to Population Density:")
[1] "Moran's I for Access Points relative to Population Density:"
Code
print(moran_density)
Moran I test under randomisation
data: district_data_complete$ap_per_density
weights: w
n reduced by no-neighbour observations
Moran I statistic standard deviate = 4.3008, p-value = 8.51e-06
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.208806834 -0.013888889 0.002681206
Code
# Calculate ratio of access points to schoolsdistrict_data_complete$ap_per_school <- district_data_complete$access_points / district_data_complete$schools# Calculate Moran's Imoran_schools <-moran.test(district_data_complete$ap_per_school, w, zero.policy =TRUE)print("Moran's I for Access Points relative to Schools:")
[1] "Moran's I for Access Points relative to Schools:"
Code
print(moran_schools)
Moran I test under randomisation
data: district_data_complete$ap_per_school
weights: w
n reduced by no-neighbour observations
Moran I statistic standard deviate = 3.1227, p-value = 0.0008959
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.229109922 -0.013888889 0.006055367
Code
# Set up to display two Moran plots side by sidepar(mfrow=c(1,2))# Plot for Access Points per Densitymoran.plot(district_data_complete$ap_per_density, w,labels=district_data_complete$NAME_2,xlab="Access Points per Density",ylab="Spatially Lagged AP per Density",main="Access Points per Density",zero.policy=TRUE)# Plot for Access Points per Schoolmoran.plot(district_data_complete$ap_per_school, w,labels=district_data_complete$NAME_2,xlab="Access Points per School",ylab="Spatially Lagged AP per School",main="Access Points per School",zero.policy=TRUE)
Code
# Reset plotting parameterspar(mfrow=c(1,1))# Let's also create choropleth maps to visualize these ratios# Map for AP per Densityleaflet() %>%addProviderTiles(providers$CartoDB.Positron) %>%# First map: AP per DensityaddPolygons(data = district_data_complete,fillColor =~colorNumeric("viridis", ap_per_density)(ap_per_density),fillOpacity =0.7,weight =1,color ="white",popup =~paste("<strong>District:</strong>", NAME_2,"<br><strong>AP per Density:</strong>", round(ap_per_density, 4) ) )
Code
# Map for AP per Schoolleaflet() %>%addProviderTiles(providers$CartoDB.Positron) %>%# Second map: AP per SchooladdPolygons(data = district_data_complete,fillColor =~colorNumeric("viridis", ap_per_school)(ap_per_school),fillOpacity =0.7,weight =1,color ="white",popup =~paste("<strong>District:</strong>", NAME_2,"<br><strong>AP per School:</strong>", round(ap_per_school, 4) ) )
Code
library(spatstat)
Warning: package 'spatstat' was built under R version 4.4.2
Loading required package: spatstat.data
Warning: package 'spatstat.data' was built under R version 4.4.2
Loading required package: spatstat.univar
Warning: package 'spatstat.univar' was built under R version 4.4.2
spatstat.univar 3.1-1
Loading required package: spatstat.geom
Warning: package 'spatstat.geom' was built under R version 4.4.2
spatstat.geom 3.3-4
Loading required package: spatstat.random
Warning: package 'spatstat.random' was built under R version 4.4.2
spatstat.random 3.3-2
Loading required package: spatstat.explore
Warning: package 'spatstat.explore' was built under R version 4.4.2
Loading required package: nlme
Attaching package: 'nlme'
The following object is masked from 'package:dplyr':
collapse
spatstat.explore 3.3-3
Loading required package: spatstat.model
Warning: package 'spatstat.model' was built under R version 4.4.2
Loading required package: rpart
spatstat.model 3.3-3
Loading required package: spatstat.linnet
Warning: package 'spatstat.linnet' was built under R version 4.4.2
spatstat.linnet 3.2-3
spatstat 3.3-0
For an introduction to spatstat, type 'beginner'
Code
library(sf)# First, transform both datasets to UTM Zone 17N (EPSG:32617)schools_projected <-st_transform(schools_data, 32617)ap_projected <-st_transform(ap_data, 32617)district_projected <-st_transform(district_data, 32617)# Create the observation windowwindow <-as.owin(st_union(district_projected))# Create point pattern objects for both schools and access pointsschools_coords <-st_coordinates(schools_projected)ap_coords <-st_coordinates(ap_projected)# Create a marked point pattern with both types of pointspoints_combined <-ppp(x =c(schools_coords[,1], ap_coords[,1]),y =c(schools_coords[,2], ap_coords[,2]),marks =factor(c(rep("school", nrow(schools_coords)),rep("ap", nrow(ap_coords)))),window = window)
Warning: 49 points were rejected as lying outside the specified window
Warning: data contain duplicated points
Code
# Calculate cross-type pair correlation functioncross_pcf <-pcfcross(points_combined, "school", "ap", correction ="translate")# Plot the cross-type pair correlation functionplot(cross_pcf, main ="Cross-type PCF: Schools vs Access Points",xlab ="Distance (m)", ylab ="g(r)")abline(h =1, lty =2) # Add reference line at g(r) = 1
Code
# Calculate cross-type K functioncross_K <-Kcross(points_combined, "school", "ap", correction ="border")# Plot the cross-type K functionplot(cross_K, main ="Cross-type K function: Schools vs Access Points",xlab ="Distance (m)")
Code
# Calculate cross-type L functioncross_L <-Lcross(points_combined, "school", "ap", correction ="border")# Plot the cross-type L functionplot(cross_L, main ="Cross-type L function: Schools vs Access Points",xlab ="Distance (m)")
Code
library(sf)# First project to UTM for accurate buffer creationhighway_projected <-st_transform(highway$osm_lines, 32617) # UTM 17Nap_projected <-st_transform(ap_data, 32617)# Create buffers in UTMbuffer_5km <-st_buffer(highway_projected, 5000)buffer_10km <-st_buffer(highway_projected, 10000)# Transform everything back to WGS84 for mappinghighway_wgs84 <-st_transform(highway_projected, 4326)ap_wgs84 <-st_transform(ap_projected, 4326)buffer_5km_wgs84 <-st_transform(buffer_5km, 4326)buffer_10km_wgs84 <-st_transform(buffer_10km, 4326)# Create the mapleaflet() %>%addProviderTiles(providers$CartoDB.Positron) %>%# Add the buffersaddPolygons(data = buffer_10km_wgs84,fillColor ="yellow",fillOpacity =0.2,color ="yellow",weight =1 ) %>%addPolygons(data = buffer_5km_wgs84,fillColor ="blue",fillOpacity =0.2,color ="blue",weight =1 ) %>%# Add the highwayaddPolylines(data = highway_wgs84,color ="black",weight =2 ) %>%# Add access pointsaddCircleMarkers(data = ap_wgs84,radius =3,color ="red",fillOpacity =0.7 ) %>%# Set the view to PanamasetView(lng =-80.782127, lat =8.537981, zoom =7)